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A Greek weight associated to a parameterized random variable 
Z{\) is a random variable tt such that V xE[(p{Z(\))] = £[0(^(A))7r] 
for any function (j). The importance of the set of Greek weights for 
the purpose of Monte Carlo simulations has been highlighted in the 
recent literature. Our main concern in this paper is to devise methods 
which produce the optimal weight, which is well known to be given 
by the score, in a general context where the density of Z{X) is not 
explicitly known. To do this, we randomize the parameter A by intro- 
ducing an a priori distribution, and we use classical kernel estimation 
techniques in order to estimate the score function. By an integration 
by parts argument on the limit of this first kernel estimator, we define 
an alternative simpler kernel-based estimator which turns out to be 
closely related to the partial gradient of the kernel-based estimator 
ofE[0(Z(A))]. 

Similarly to the finite differences technique, and unlike the so- 
called Malliavin method, our estimators are biased, but their imple- 
mentation does not require any advanced mathematical calculation. 
We provide an asymptotic analysis of the mean squared error of these 
estimators, as well as their asymptotic distributions. For a discontin- 
uous payoff function, the kernel estimator outperforms the classical 
finite differences one in terms of the asymptotic rate of convergence. 
This result is confirmed by our numerical experiments. 

1. Introduction. Let A be some given parameter in W^, and define the 
function 

V^{X) :=E[<^(Z(A))], 

where Z(-) is a parameterized random variable with values in M" and (j) : — > 
M is a measurable function. In many applications we are interested in the nu- 
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merical computation of the function V'^{X) for some parameter A'', together 
with the sensitivities of with respect to the parameter A. 

In particular, in the financial literature, represents the no-arbitrage 
price of a contingent claim, defined by the payoff (j){Z(X)), in the context of 
a complete market with prices measured in terms of the price of the nonrisky 
asset (so that the model is reduced to the zero-interest rate situation). The 
sensitivities of with respect to the parameter A are called Greeks, and 
are widely used by the practitioners in their profit and loss analysis. In 
the context of the Black-Scholes model, the derivative of the option price 
with respect to the current underlying asset price is the so-called Delta, and 
represents the number of shares of risky asset to be held at each time in 
order to realize a dynamic perfect hedge of the option. The Gamma is the 
second derivative of the option price, with respect to the underlying asset 
price. It is an indicator of the variation of the hedging portfolio. Another 
important Greek is the so-called Vega (although not a Greek letter!) which 
is the derivative of the option price with respect to the volatility coefficient 
(see, e.g., Hull [12] for more details). 

We observe that the case of Bermudean options (i.e., American options 
with finite possible exercise times) is included in our framework by taking 
(j){Z(X)) as the value of the option at the first possible exercise time. The 
case of American options (with a continuous set of exercise times) can be 
covered by a limit argument, but requires a small time asymptotic analysis 
in order to control for the stability of the variance; see [8] . 

Given a numerical scheme for the computation of the function V^, the 
first natural idea for the numerical computation of the Greeks is the finite 
differences approximation of the corresponding derivative. In addition to the 
generic standard error on the numerical computation of the expectation, this 
approximation leads to a biased estimator at a finite distance and appears to 
be inefficient for discontinuous payoff functions (p. We refer to L'Ecuyer and 
Perron [7], Detemple, Garcia and Rindisbacher [5] or Milstein and Tretyakov 
[14] for a theoretical analysis of the rate of convergence of this estimator. Two 
direct methods for computing the Greeks have been presented by Broadie 
and Glasserman [3]: (i) the pathwise method, which consists in differentiat- 
ing the random variable (f){Z{X)) inside the expectation operator, and (ii) 
the likelihood ratio method which reports the differentiation on the distri- 
bution of Z{X). The first method requires the computation of the gradient 
of the payoff function (p, which is a serious limitation in practice as (j) is 
typically highly complicated or even not differentiable; see also Giles and 
Glasserman [6] for further developments in this direction. As for the second 
method (ii), it was (apparently) restricted to the very special cases where 
the distribution of Z{X) is known explicitly. This difficulty was overcome by 
Fournie, Lasry, Lebuchoux, Lions and Touzi [9, 10] who exploited the Malli- 
avin integration- by-parts formula to show that, for smooth random variables 
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z{-), 

(1.1) VAE[</>(Z(A))]=E[<A(Z(A))7r], 

where vr, the so-called Greek weighty is a random variable independent of the 
pay-off function (p. A quick overview of the notion of Greek weights is re- 
ported in Section 2. Further developments of the results of [9] were obtained 
by Gobet and Kohatsu-Higa [11]. The comparison of the above different 
methods is available in the survey paper of Kohatsu-Higa and Montero [13]. 
An important observation is that the set of Greek weights which satisfy 

(1.1) is a convex set of random variables. By an easy variance reduction 
argument, it is easily seen that the score vr* := ln/(A'', Z(A'')) minimizes 
Var[(/!)(Z(A))7r], whenever the density f{X,z) of the random variable Z{X) 
exists and is sufficiently smooth. In general, the use of the Malliavin calculus 
does not lead to this optimal Greek weight, except in trivial cases where the 
density f{X,z) is explicitly known, which corresponds to the case covered 

by [3]. 

The main purpose of this paper is to focus on the use of the optimal 
Greek weight in order to estimate the corresponding Greek by the Monte 
Carlo method. To do this, our main idea is to randomize the parameter A 
and to rewrite as a regression function: 

V*{X):=E[^{Z{A))\A = \], 

where Z{A) is a random variable with density ip{X,z) := £(A'^ — X)f{X,z), 
and ^(A'^ — •) is some given randomizing distribution on the parameter A 
around A''. In other words, the random variable Z{A)\A = X has the same 
distribution as the random variable Z{X) defined by the density f{X,z). We 
next assume that our observations consist of a family {(Aj, Zi),l < i < N} 
of independent pairs (Aj, Zj) drawn in the density ip, and we define various 
kernel estimators of the Greek 

(1.2) V,E[HZiX))]\x=xo = miZiX'))s{X', Z(AO))], 

where s{X,z) := V xln f{X, z) is the score function. The first natural idea is 
to notice that 

(1.3) E[0(Z(A°))s(A°,Z(A°))]=E[c/>(Z(A))s(A,Z(A))|A = AO], 

which is a usual regression function. Thus, a two-steps estimation method is 
proposed: we first perform a kernel-based estimator s of the score function, 
and then we define a kernel regression estimator of the Greek by substituting 
s to s. In the sequel the resulting estimator is referred to as the double 
kernel-based estimator and is denoted by f3. 

Our next kernel estimator of the Greek is based on a convenient integration- 
by-parts in (1.2). This leads to a much simpler estimator /? which turns out 
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to be closely related to the estimator /?, obtained by direct differentiation 
of the classical kernel regression estimator of V'^{X) = E[i^(Z(A))|A = A'^]. 
These two estimators will be referred to as the single kernel-based estima- 
tors. 

Let us observe that, unlike the so-called Malliavin Greek technique, our 
suggested estimators are biased but do not require any advanced mathe- 
matical calculation for their implementation. These two features are shared 
with the finite differences method. Also, intuitively, the randomization of 
the parameter A introduces an additional noise which may imply that our 
estimators are less accurate than their classical competitors. Our numerical 
results show indeed that the Malliavin Greek estimators are by far more 
accurate even in the case of an Asian option where the Greek weight is 
not optimal. Therefore, the main purpose of this paper is to provide a deep 
comparison between our estimators and the ones of finite differences. 

Our three suggested estimators are defined precisely in Section 3, and 
their asymptotic properties are discussed in Section 4. We show that (3 and 
are asymptotically equivalent. The asymptotic properties of (3 are derived 
under stronger conditions on the pay-off function (j) and the kernel functions. 
The simultaneous choice of the bandwidth and the number of observations 
is also more restrictive in the latter case. 

An important observation is that the two single kernel based estimators 
coincide if and only if the randomizing distribution £ is a truncated expo- 
nential distribution. In this case, by conveniently relating the support of the 
truncated exponential distribution to the kernel bandwidth, we observe that 
the rate of convergence is independent of the dimension of the parameter 
A. We next solve the optimal choice of the randomizing distribution within 
this class by minimizing the corresponding mean square error. 

Our asymptotic results imply the following main property of the single 
kernel based estimators: for a discontinuous payoff function (p, the asymp- 
totic rate of convergence of our estimator is better than the classical finite 
differences one, whenever the order of the kernel function is larger than some 
explicit threshold. In the case of a truncated exponential randomizing dis- 
tribution, with support related to the kernel bandwidth, the single kernel 
based estimator has a better asymptotic rate of convergence whenever the 
order of the kernel function is larger than four. 

Some numerical results are reported in Section 5. We estimate the delta 
of an European and an Asian digital call option. Our experiments show that 
the Malliavin-based estimators defined in [9] or [3] are the most efficient, 
as documented by the previous literature. As predicted by our theoretical 
asymptotic results, the single-kernel based estimator outperforms the finite 
differences one, but this is only observed for a large number of simulations. 
We believe that this does not restrict the interest in our new suggested 
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method as this is just a matter of computer power, and the required num- 
ber of simulations can be significantly reduced by using variance reduction 
techniques. For instance, the technique of antithetic variables applied to the 
randomizing density appears to be very efficient. 

2. The Greek weights set. Throughout this paper we consider a complete 
probability space {Q,J-,P). Let Z{X) be some random variable, valued in 
M", depending on some finite-dimensional parameter A G M'^, and set 

V'^iX):=E[(j){Z{X))] for (/)GL°°(R",M). 

In order to simplify the presentation, we shall focus our attention on some 
fixed particular value A*^ of A, and we denote 

Z°:=Z(A°). 

The chief goal of this paper is to devise efficient methods for the computation 
of the sensitivity parameter 

/3°:=VaI^^(A°), 

for arbitrary functions (f) chosen from a suitable large class. We assume 
that the distribution of Z(X) is absolutely continuous with respect to the 
Lebesgue measure, and we denote by /(A, z) the associated density, that is, 

E[0(Z(A))] = J (t){z)f{X,z)dz for all (/> G L°°(M",R). 

Under mild smoothness assumptions on the density /, we directly compute 
that 

VxV^iX') := ^(A°) = E[cl,{Z')S% := s{X^ Z°), 

where the function s is independent of (j) and is explicitly given by 

siX,z) :=VA{ln/(A,2)}. 

This idea was introduced by Broadie and Glasserman [3] in the context of 
the Black-Scholes model where the density f(X,z) is explicitly known. 
We shall always assume that 

(2.1) E|5°p<oo. 

Under this condition, the set 

W.= {tt£ l2(17,R'^) : VAy*(A°) = ^[0(Z°)7r] for ah G L°°(]R", M)} 

is not empty. From the arbitrariness of (/> G L°°(M",M), it is immediately 
seen that 

W = {TTeh\n,R'^):E[TT\Z°]=S^}, 
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and therefore. 



> E[(f){ZyE[7r\Z'^]E[7r\Zy] - VF<^(AO)Vy'^(A°)' 
= ^[0(^0)25° 5°'] - V1/'^(A°)Vy<^(A°)' 
= Var[(/)(Z°)5°]. 



Hence 



5° 



G yV is a minimizer of Yai[cl){Z )7r] 



Throughout this paper we call the optimal Greek weight. When the 
density function f{X,z) is not known, it was suggested in [9] to obtain 
(inefficient) Greek weights from the set W by exploiting the integration 
by-parts-formula from Malliavin calculus. Our main objective is to derive 
Monte Carlo estimators of the Greek value P^, which asymptotically achieve 
the minimum variance, by using methods from nonparametric statistics to 
approximate the above optimal Greek weight S^. 

3. Kernel estimation and optimal Greek weight. 

3.1. Randomization of the parameter. The main idea of this paper is to 
randomize the parameter A in order to estimate the Greek by the classical 
kernel estimation technique. This randomization can be exploited from two 
viewpoints. First, one can use it in order to estimate the optimal Greek 
weight, that is, the score function. An alternative viewpoint is to take ad- 
vantage of the smoothness of the randomizing distribution in order to obtain 
an integration by parts formula similar to the Malliavin integration by parts 
technique. This technique is well known in the nonparametric statistics lit- 
erature; see, for example, [1]. 

Let £ : M.'^ — > M be some given probability density function, with support 
containing the origin in its interior, and set 



where A is the parameter of interest. We consider a sequence 

(3.1) (Aj, Zj)i<i<iv of independent r.v. with distribution ip{X,z), 

so that, for any i < N, i[X^ — •) is the density of A* and /(A*,-) is the 
conditional density of Z^ given A*. 

Remark 3.1. Notice that the simulation of (Aj, Zj)j>i can be performed 
easily even in cases where the density (p can not be written explicitly. This 
applies typically to the case where Z{X) = Xt{X), for some integer T, where 



^{X,z):=i{X'-X)f{X,z) 



for XeR'^ and z£ W 



n 
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{Xt{X),t G N} is a Markov chain with given transition density. Then, for a 
given value of A, the simulation of Z is easily feasible by usual methods. 
However, the marginal distribution of Z{X) is typically very complicated so 
that it is useless for the numerical computation of the score function s(A, z). 

In this section we provide various estimation methods of (3 based on non- 
parametric kernel methods. We then introduce the kernel function 



K-.R" — >R with J K = l, 
whose precise properties will be detailed at the beginning of Section 4. 

3.2. A first kernel estimator of the Greek. The main idea is that the 
optimal weight requires a priori the knowledge of the probability density 
function f{X,z) and the associated score function s{X,z). Indeed, if these 
functions were explicitly known, then a natural nonparametric estimator of 
the Greek /3 using the observations (3.1) is 

1 ^ /A°-A- 

Although s is not explicitly known in our applications of interest, one could 
approximate it by means of an additional kernel estimator based on another 
kernel function K defined on R". We therefore introduce our first kernel- 
based estimator of /3: 

1 A /aO-A 



where s^* is an approximation of s given by 
%'(A,2) := ^Ini ^ 



dX \£{X° - X){N -l)h'^+'' 



N 



(3.4) X y: ^(^T^)^ 



and 



(3.5) <^-(A,z):=^^ J2 K 



^'~\X z) I ^^(^-^°) 



h 
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(3.6) 



-d—n—l 



N -I 



^ /A-A,\ 



h 



From a practical point of view, this estimator displays two drawbacks. First, 
its expression involves a product of two (possibly multidimensional) kernels 
K and H. Thus, it suffers from the so-called "curse of dimensionality." 
Moreover, its calculation is time-consuming. In the subsequent subsections 
we introduce two alternative kernel estimators of /3, which involve a single 
kernel function and a single summation. 

From a theoretical point of view, we shall see that this estimator achieves 
the same rate of convergence as the two following ones but requires more 
stringent conditions, and involves heavy calculations. 



3.3. A simpler kernel estimator of the Greek. We continue our discussion 
under the condition that 



(3.7) 



the kernel function K has compact support. 



We still consider the natural estimator given by (3.2). For fixed /i > 0, it 
follows from the law of large numbers that 



(3.8) 



W— >oo 



E 



(l){Z)s{k,Z)K 



A°-A 



P-a.s., 



where (A,Z) is a random variable with distribution ip{\,z). Recalling the 
definition of s, and integrating by parts with respect to the variables Ai, . . . , A^, 
we see that, for /i > sufficiently small, 

•AO 



■N\ 



1 

Z(o)F 

^(0) 



{z)K 



A 



h 



£(AO-A)Va/(A,z) dXdz 



\(vk( 



A" -A 



V h 
./AO 



^(0)/i'^+ 



-E 



+ hK 
cI^{Z){vkI^ 
+ hK 



h J i 
AO-A^ 



(AO- A) ]f{X,z)dXdz 



h 
AO 



(AO - A) 
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where we used (3.7). This suggests the fohowing simpler kernel estimator of 
the Greek /3: 

The asymptotic properties of /Jtv will be provided in Section 4. 

3.4. Differentiating the kernel estimator of the price. We next start out 
from the natural kernel estimator of the price V'^{X): 

Differentiating V^(A) with respect to A, we obtain our final kernel estimator 
of the Greek: 

(3.10) 

Observe that our two estimators 13^ and 13 n are closely related by 
In particular, 

Pn = Pn whenever id ^ e^^-^^'-^lsil) 

(3.11) 

is a truncated exponential distribution, 

for some parameters ao £ 1^, oi £ IK'^ and some subset B of M*^ containing 
the origin in its interior. 

The asymptotic properties of this third estimator will also be provided in 
Section 4. 
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4. Asymptotic results. We now compare the estimators defined in the 
previous section from the viewpoint of their asymptotic distributions. The 
heavy asymptotic analysis of the double kernel-based estimator is reported in 
[8] , where it is shown to have the same asymptotic rate of convergence as the 
single kernel-based estimators, under more stringent conditions. Since the 
practical implementation of the double kernel-based estimator is, in addition, 
more time consuming, the discussion of this section will focus on the single 
kernel-based estimators. 

We shall first show that the two single kernel-based estimators have equal 
asymptotic rates of convergence. Then, we derive the optimal choice of the 
number of simulations N and the bandwidth h of the kernel function K, 
by using the classical mean square error minimization criterion. We next 
specialize the discussion to the case of a truncated exponential randomizing 
distribution (3.11) with support defined by B := [— e,e]'^. In this setting, we 
observe that the rate of convergence of the kernel estimator is independent 
of the dimension of the parameter A for some convenient choice of e in 
terms of the bandwidth h. Finally, we discuss the optimal choice of the 
randomizing density £ within the class of truncated exponential distribution, 
and we provide a quasi-explicit characterization of the optimal truncated 
exponential randomizing distribution in the sense of the mean square error 
criterion. 

Before stating our results, we recall that the order of the kernel function 
K is defined as the smallest nonzero integer p such that there exist some 
integers (ji, . . . , jp), jfe € {1, . . . , d}, such that 

J la^... larK{l) dl = for < r < p, Ofc e {1, . . . , d} 

and 

I lj,...l,^K{l)dl^O. 

Typically, if K is the product of d even univariate kernels, then it is of order 
p = 2 (at least). The regularity hypothesis on the kernel function K will be 
the following. 

Assumption K. The kernel function K : M'^ ^ M is C^, compactly sup- 
ported, satisfies J K = 1, and is of order p>2. 

In the subsequent subsections we shall use the notation 

(4.1) e^[^](A,z):=i^ f: {jh.---h.mdi)y%^...>.j{\z) 

iiv,ip=i 

for every smooth function if) defined on x M". We shall also denote A® := 
A A' for every matrix A. 
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4.1. Asymptotic results for the single kernel-based estimators. Our first 
result requires some regularity conditions on the density functions / and i. 

Assumption R1. For every z, the functions f{-,z) and £ are p + l times 
differentiable, and for every integer i < p, the function A i — > V\{£{X^ — 
-^)Va/(A,z)} is continuous at A'' uniformly with respects to z G 5, for some 
subset S s.t. Supp((/)) C int(5). 



Proposition 4.1. Under Assumptions K and Rl, as N ^ oo and h ■ 
0, the bias and the variance of [3n satisfy 

(4.2) E[/3^]-/3~Ci/iP and Var[/37v] ~ 



where 



(4.3) 



^1 - 7^ / ^^[^(^° - •)/a](A°,z)</'(z)c?z and 



m 
m 



Proof. By definition of (3n, we have E0m] = ^if^N]- By (3.8), this 
provides 

m := E0r,] = J Hz)£{X' - A) Va/(A, z)K (^^) dz 

= ^ j <P{z)£{hl)fx{\'' -hl,z)K{l)dldz. 

Clearly, ■0(0) = / (t){z)fx{\^.,z)dz = 13. Moreover, since K has compact sup- 
port, it follows from Assumption Rl that the function tfj is p times differen- 
tiable at zero, with derivatives obtained by differentiating inside the integral 
sign, so that its ith iterated derivative denoted tp^^^O) are given by 



X: (/ l,,...l^Ml)dl)(yj </'(^)[V\^.,,...,A,J^(A°--)/A}](A°,z)dz) 



^(0) ,^ 

for every 1 < i < p. Since p is the order of K, observe that (0) = for 
every 1 <i <p, so that a Taylor expansion of ijj provides the first part of 
the proposition. 

As for the variance, we directly compute that 

Var[/3.]- 



iV/l2d+2^(0)2 
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A0-A\ 



Hr(A^-A) 



h J I 

/ /A0-A\ /AO 

V2 :=E 

By a similar argument as in the first part of this proof, we compute that 
vi = h'^ 4)^[z) iVK{l) + hK{l) — {hl)\ i{hl)f{\^ - hi, z) dldz 

^hH{Q)(^j VA'(/)®(i/^E[(/.2(Z°)]. 
The required result follows by observing that V2 = 0{h'^^^). □ 
We are now ready for our first main result. 

Theorem 4.1. (i) Let the conditions of Proposition 4.1 hold, and as- 
sume that 



(4.4) 



h — >0 and Nh'^+'^ — > oo as N —>■ oo. 



Then, with S as in (4.3), we have ^Nh^^+^^N -^Pn]) — > 7^(0, S) in 
distribution. 

(ii) In addition to the above conditions, assume that 
(4.5) iv/i'^+2+2p — , as ^ oo. 

Then the bias vanishes and ^/Nh/^{(3iy — (3) — > AA(0, E) in distribution. 

Proof. We shall prove this result by verifying the Lyapounov conditions 
(see, e.g., Billingsley [2], page 44). Let a be an arbitrary vector in W^, and 
define, for every i = 1, . . . ,N , 



-I 4 



Nhd- 

Xf:=a'{Y,^-nYr])- 



^^(Z.)(vi.(^)+.i.( 



A° - A,- \ V£ 



-(A°-A,0 



In view of Proposition 4.1, the only condition which remains to check in 
order to verify the Lyapounov conditions is the existence 5>2 such that 



1 ^ 

(4.6) sup — ^E[|Xf 1"^] < +00 where := Var 



N 



i=l 



TV 
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In order to prove (4.6), we start by observing from (4.2) that 



l^—&[<t>\z'^)] J \a'VK{l)fdL 



We next estimate by the Minkowski inequahty and (4.2) that 
\\X^\\s<\\a'Y/'\\5 + \a'E[Yf]\ 

= \\a%''\\5 + ^\anPN]\ 

^ ^^MiZ)a^{V,K{{X^ - A)/h) + hK{{X^ - A)/h)Vd/i{X' - A)) h 



< Const I T—r H 

where the last estimate is obtained by a Taylor expansion with respect to 
the h variable, in the neighborhood of the origin, following the method used 
in the proof of Proposition 4.1. Hence, 

— VeHX^I'^I < Const N - fA^/i^+^l'^/^ < '^'^''^^ 

< hi {Nhd+^Y^ > - (W)(^-2)/2' 

and condition (4.6) is satisfied when Nh'^ — > oo, as assumed in (4.4). There- 
fore, J2iLi is asymptotically Gaussian, with a variance matrix given 
by cj^. By the arbitrariness of a G M*^, the required result follows from the 
Cramer- Wold device; see, for example, Theorem 25.5, page 405 in [4]. □ 

We next turn to the estimator /? which was defined as the gradient, with 
respect to A, of the kernel based estimator V^{X) of the function V^(A). The 
asymptotic properties of this estimator are obtained by following the tech- 
niques of the previous proofs and require the following regularity condition 
on the densities / and i. 

Assumption R2. For every z, the functions /(•, z) and i are p + 1 times 
differentiable, and for every integer i < p, the function A i — > V\{£{X^ — 
X)f{X,z)} is continuous at A'' uniformly with respect to z £ S, for some 
subset S s.t. Supp((/)) C int(5). 

Proposition 4.2. Under Assumptions K and R2, as N ^ oo and h 
0, the bias and the variance of (3n satisfy 



]E[/37v] -/3~C2/i*' and Var[/3jv] 
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where T, is given by (4.3) and 

Proof. The proof is essentiahy similar to that of Proposition 4.1. Recah 
that the estimators /?7v and /J^v are related by 

We start by analyzing the bias term. Recall from the proof of Proposition 
4.1 that 

E[/3^] = -^ / (l){z)l{hl)fx{\^ -hl,z)K{l)dldz. 



m 

We then deduce from (4.7) that 

nM = J^I H^) ('/'a(A° - hi, z) + ^(0)(^(A° - hi, z)^ K{1) dl dz. 

We now observe that Assumption R2 allows to derive an expansion in the 
h variable of the above expression, near the origin, up to the order p. The 
coefficients of the expansion are obtained by simple differentiation inside 
the integral sign. Finally, since p is the order of the kernel K, it is easily 
seen that the coefficients of /i*, i <p, in this expansion vanish, and the only 
nonzero coefficient is that of h^. 

The variance of $n is also treated by the same argument as in the proof 
of Proposition 4.1, and the dominant term in the expansion of the variance 
is easily seen to be the same as in that proof. □ 

Proposition 4.2 says that and 0^ have the same asymptotic variance, 
and the orders of their asymptotic biases are the same. Our next result states 
that these two estimators have exactly the same asymptotic distribution. 

Theorem 4.2. (i) Let the conditions of Proposition 4.2 hold, and as- 
sume further that (4.4) holds. Then, with S as in (4.3), we have 
^[Pn]) — >AA(0,S) in distribution. 

(ii) Let (4.5) hold, in addition to the above conditions. Then the bias 
vanishes and 



VWh^iPN - P) — > AA(0, S) in distribution. 
Proof. Define the sequence 

and follow the lines of the proof of Theorem 4.1. □ 
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4.2. Optimal choice of N and h. The two single kernel-based estimators 
(3j\f and have similar asymptotic properties. They both have a bias of 
order /i^, a variance of order l/{Nh'^~^'^) and a convergence in distribution 
at the rate V Nh'^^'^ . Therefore, the determination methods of the optimal 
N and h will be similar for both of them, and we only detail calculations 
for the estimator fd^. Let the conditions of Proposition 4.1 hold. Then (4.2) 
holds, and we calculate an asymptotic equivalent for the mean square error 
between f3N and /3: 



MSE(/3jv):=lE[|/37V-/3p] 



Minimizing the MSE in h, we get the asymptotically optimal bandwidth 
selector: 

^ ^ (d + 2)TV(S) V/('^+^P+^) 

Note that h is of order iV-i/('^+2p+2) ^ leading to an MSE of order iV-2p/(rf+2p+2) ^ 
Similarly, the asymptotically optimal bandwidth selector for /?jv is 

,_^ (d + 2)TV(S) ^W^+^) 
^ ^"l 2p|C72|2Ar ) 

These results imply asymptotic theoretical choices for h relative to N, but 
we may still encounter difficulties in the numerical calculation of h. Even if 
the optimal order of h were known, we still need to evaluate the associated 
constant coefficients. From our empirical experiments, we observed that the 
accuracy of our estimators depends heavily on the choice of the bandwidth 
h, as usual in kernel estimation. 



4.3. The case of a uniform randomizing distribution. We first study fur- 
ther the case where the randomizing density is uniform on the sphere of M'^ 
centered at with radius e: 

Observe that this is a particular example from the truncated exponential 
distributions (3.11) for which the single kernel density estimators coincide: 

^~ = ^« = i|^E*(^.)VA-(^). 

1=1 

Without loss of generality, we assume that the kernel K has support on 
[—1,1]'^. We first rewrite Assumption Rl in the setting of this section. 
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Assumption R3. For every z, the function f{-,z) is p + 1 times dif- 
ferentiable, and for every integer i<p+l, the function A i — > V\/(A,z) 
is continuous at A'' uniformly with respect to 2; S S, for some subset S s.t. 
Supp((^) C int(S'). 

Proposition 4.3. Let Assumptions K and R3 hold. Then, as N ^ 00, 
h^O and e ^ with €>h, we have 

(4.10) E0n] - P ^ CuhP and Var[/3jv] ~ iV^/i'^^'^e'^S^, 
where 

Cu:= f C?^[/A](A°,2)</>(z)dz and 

(4.11) 

S„:=2^E[(/)2(Z°)] / VK®. 

Proof. The proof is similar to the one of Proposition 4.1. Denoting by 
Id the vector of M'^ with unit component, we rewrite 

If ( /•^"+^id /A" - A\ \ 
'^l""! = 1^LJ(4L.. VA-(— )/(A,.)dA) 4. 

= T I <t>{z)( I VK{u)f{X^ -uh,z)du]dz. 

h Jr" VJ[_e/h,,£//i]d / 

Since e>h and K is supported on [— 1,1]*^, we may replace in our last 
term the integration on [— ^, ■|]"' by an integration on W^, which is necessary 
to get the convergence of our estimator to (3^. Then, as in the proof of 
Proposition 4.1, an integration by parts followed by Taylor expansions gives 
us the expected equivalent of the bias. The same argument applies for the 
computation of the variance of (3n- D 

Sending e to zero, we obtain the same asymptotic properties as in Propo- 
sition 4.1, as long as e > /i. Therefore, the asymptotic optimal e is simply the 
bandwidth h. The kernel-based estimator (3fj associated with this optimal 
uniform density ^ is then given by 

(4.12) «i:=A.i:,(Z.)Vi.^ . 

1 = 1 

and satisfies 



(4.13) E[/3]^] and Var[/3]^] ~ TV-^/i-^S, 
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with Cu and S„ defined in (4.11). Minimizing the corresponding mean square 
error, we obtain the optimal bandwidth 

TrS„ \i/(2p+2) 



(4-14) h-:-^, ,„ ,2,, 

As in the study of the previous estimators, we also obtain a central limit 
theorem for the estimator (3^^. 

Theorem 4.3. (i) Let the conditions of Proposition 4.3 hold in the par- 
ticular case where e = h, and assume further that 

(4.15) h — >0 and Nh^ — > oo asN^oo. 

Then, with S„ as in (4.11), we have VnJ^ 0^ - — > AA(0,S„) in 
distribution. 

(ii) If, in addition, Nh'^^'^'^ 0, then the bias vanishes and 
^/Nh?0N -P) — ' -^(0> ^u) in distribution. 

A remarkable feature of the above asymptotic result is that the rate of 
convergence is independent of the dimension d of the parameter A''. 

4.4. The case of a truncated exponential randomizing distribution. In 
this subsection we specialize the discussion to the one-dimensional case, and 
we consider a truncated exponential randomizing distribution: 

ei 

^W:=^^^73^1[-.e](0, 

with the parameter G M, so that the two single kernel estimators associated 
to this density coincide: 

Using the same line of arguments as in Proposition 4.3 , we see that, under 
Assumptions K and R3, as — > oo, /i — > and e ^ with e>h, we have 

(4.16) K[/3n] - l3 ^ CehP and Var[/3iv] ~ A^~^/i~^eSe, 
where Eg := defined in (4.11) and 

P+i 



(4.17) 

\p~k+l 



X (^jyif{\'^,z)4>{z)dzy-9y 
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Again, the asymptotic optimal e is simply the bandwidth h and the kernel- 
based estimator (3j^ associated with this optimal exponential density is given 
by 

The optimal bandwidth is obtained by minimizing the corresponding mean 
square error: 

TrS \V(2p+2) 
(4.19) ' ^ ^ 



which leads to the following MSE: 

(4.20) MSE{p%) = 2(p+ l)p-P/(P+i)[|Ce|2(TrSe)^]^/(P+^)iV-P/(*'+^). 

As in Theorem 4.3, a central limit theorem for the estimator can be 
derived. 

Remark 4.1. From the asymptotic viewpoint, the estimators based on 
the truncated exponential randomizing density differ by their bias, as the 
constants Cg depends on 9 while the variance Sg = is independent of 6. 
The optimal truncated exponential randomizing density is then obtained by 
minimizing the squared bias, defined by the polynomial function Cg, with 
respect to 6. In our numerical experiments of Section 5, this minimization 
is performed by classical Newton-Raphson iterations. 

Remark 4.2. Notice that, in both cases, the choice of the radius e of £ 
depends on the kernel function K only through its support. For instance, if 
supp(i(') = [—M, M]'^, then the optimal radius is e = Mh. 

4.5. Comparison with the finite differences estimators. We first start by 
recalling the finite differences estimators. For ease of presentation, we let 
d = l. The finite differences estimator of the parameter /S" := VaIE[i;^(-Z'(A''))] 
is based on the finite differences approximation of the gradient 

VaE[c/)(Z(aO))] ~ + «g))] - nnZjX'' - (1 - a)e))] ^ 

where e > is a "small" parameter, and a £ [0, 1]. The values a = 0, 0.5 
and 1 correspond respectively to the backward, centered and forward finite 
difference. The above finite difference approximation suggests the following 
finite differences estimator of /3: 

1 ^ 

f^N^" = Jf^T.(<t>[Z\\' + as)] - ^[Z\X' - (1 - a)e)]). 
1=1 
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The asymptotic properties of these estimators were first studied by L'Ecuyer 
and Perron [7]. In the case where A ^ 0[Z(A)] G Cl{R'^), when ^ oo and 
e — > with N^/^e 0, they obtained a parametric rate of convergence: 

Vn {I3JP -p) — > Af{0, Sq) in distribution, for a = i and 1. 

When the payoff function (p has a countable number of discontinuities, De- 
temple, Garcia and Rindisbacher [5] obtained the following central limit 
theorems: 

For a = I , when iV^/^e 0, 

N^^HPJF -13) — > AA(0, S„) in distribution. 

A''^oo 

For Q = 0,1, when TV^/^e ^ 0, 

N^'^CfiF -13) — > AA(0, S^) in distribution. 

Af^oo 

In the general case d > 1 , the finite differences estimators are defined com- 
ponentwise, and therefore, the rate of convergence is not affected by the 
dimension d of the parameter A*'. 

The main objective of this paragraph is to provide an asymptotic com- 
parison of the single-kernel based estimator with the finite differences one. 
The key point of our single-kernel based estimators is that the differentia- 
tion with respect to the parameter A is reported on the density of Z[\) so 
that our asymptotic results do not involve the regularity of the pay-off func- 
tion (f). For any pay-off function (/), and when _/V/j'^+2p+2 — ^ -^g derived in 
Theorems 4.1 and 4.2 that 

\^Nhd^0N-f3) — > N{0,T,) in distribution, 

TV^oo 

where p is the order of the kernel function. Minimizing the corresponding 
MSE, we obtained in Section 4.2 an optimal h of order A^~^/{'^+2p+2) which, 
of course, almost satisfies the condition required in the convergence in dis- 
tribution. Therefore, taking a bandwidth h of order ]Si-'^/i'i+'^P+'^)-'^s/{d+2) 
with (5 > sufficiently small leads to a convergence in distribution at rate 
with r := p/{d + 2p + 2) — 5 > 0. Therefore, the single-kernel based es- 
timators, with kernel of order p > 2d + 4 and 5 sufficiently small, achieve 
a convergence rate of order r > 2/5. Hence, they outperform all the finite 
differences estimators in the case of discontinuous payoffs. 

Notice that, by taking kernel functions of order p sufficiently large, we can 
obtain a convergence rate in distribution as close as desired to the parametric 
rate \/iV. 

Remark 4.3. Consider the optimized kernel estimators (3^ and 
based on uniform or exponential density i on the sphere with radius h, 
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as derived in Section 4.3. Tlien, for Nh?'^'^'^ — > 0, we obtain a rate of con- 
vergence of V Nh?. Therefore, in order to outperform the finite differences 
estimators of a Greek associated to a discontinuous payoff function (/>, one 
just needs to use a kernel function of order p> A. 



5. Numerical results. In this section we present some numerical results 
obtained in the Black-Scholes model: 



:= j;exp 



t>0,x>0, 



where is a standard Brownian motion on (r2,jF, P) with values in M, and 
r € M, (T > are two given constants. We focus on the estimation of the 
so-called Delta: 

/3:=V.E[(/)(Z")], 

where = Sj^ for an European option and = Sf dt for an Asian 
option. As in the previous sections, we denote by /(x, •) the density of Z^ . 

We simulate independent observations Xi distributed in the (optimal) 
exponential randomizing distribution ^ on the sphere centered at Sq = x 
with radius /i, as derived in Section 4.4. The single-kernel based estimator 
is therefore given by (4.18). 



5.1. Computation of the optimal bandwidth. As the "bumping" parame- 
ter e for the finite differences estimator, the bandwidth in kernel estimation 
needs to be chosen carefully. The asymptotic results of Section 4 provide 
the expression of the asymptotic optimal bandwidth. For the truncated ex- 
ponential randomizing distribution, we obtain 



h^ 



\pClN 



l/{2p+2) 



where Eg = 2E[02(Z^)] J(VK)^ and 

(-ly 



p 



p+l 

vPK{u) du ) 



k=l 



P 
k-1 



E 



(Z'^' 



^'jix,Z- 
fix,Z-) 



-k+l 



Given a kernel function K, the coefficient Sg can be estimated by a standard 
Monte Carlo procedure. We next focus on the estimation of the parameter 



V^/(x,Z" 
f{x,Z-) 



for a given k £ {1, . . . ,p+ 1}. 
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(i) Let = Sj^ = xe^ , where Y has a normal distribution with mean 
m:= (r — and variance S := cr^T. Then it is easily checked that 



where 
(5.1) 



i=0 



d{x, z) := 



In z — In X — m 



and the coefficients (o^ )(ij)G{o,...,A:}2 are given by 



(5.2) 



L{i=0}7 



i + 1 



with the convention a-f =0 for f < and i> j. Hence, 



<P{Z-)iY,aU{x,Z' 



\i=0 



and this parameter can be estimated by a straightforward Monte Carlo pro- 
cedure. 

(ii) In practice, the distribution function is unknown, and the calculation 
of the previous paragraph can not be used to estimate £k- We suggest to 
mimic the same principle as the usual Silverman's rule-of-thumb in kernel es- 
timation (see Scott [15], e.g.): let m and S be two given estimates of the mean 
and variance ln(Z^/x), respectively, and define d{x,z) and (a^ )(ij)g{o,...,fc}2 
by substituting (m,Il) to (m,S) in (5.1)-(5.2); then the coefficient £k is 
approximated by 



4 



xP 



-E 




Once the coefficients £k estimated for I < k < p + 1, the parameter 9 is 
chosen through a numerical minimization; see Remark 4.1. In the particular 
case of an uniform randomizing distribution (9 = 0), remark that only the 
estimation of £p+i is necessary. 

Therefore, the numerical procedure is divided in three steps: first, we esti- 
mate the terms detailed in the previous subsection Eg, £k, rh and Se through 
a Monte Carlo procedure with very few simulations. Then, we calibrate the 
parameter 9 by minimization and we deduce the exponential optimal theo- 
retical bandwidth. Finally, we estimate the delta of the option by means of 
a single-kernel based estimator with the estimated bandwidth. 



22 



R. ELIE, J.-D. FERMANIAN AND N. TOUZI 



Remark 5.1. The numerical effort dedicated to the calculation of the 
optimal bandwidth parameter h is also encountered in the classical finite 
differences method, as the optimal bumping parameter e involves some a 
priori numerical simulations. 

5.2. Numerical comparison of the estimators. We present here numerical 
results obtained for the estimation of the delta of an European and an Asian 
at-the- money digital calls, that is, with a payoff of the form (p{s) = 1s>k- 
Since this payoff function is discontinuous, the results of Section 4.5 show 
that the single-kernel based estimator achieves a better rate of convergence 
than the finite differences estimators, whenever the kernel has order p > 4. 
The main object of this section is to verify the empirical validity of these 
asymptotic results. 

In order to compare their behavior, each estimator has been computed 200 
times and their empirical distributions have been approximated by classical 
smoothing nonparametric estimation. 

Our numerical experiments are performed with the following values of the 
parameters: 

5-0 = 120, r = 0, (7 = 0.2, T = l and K = 120. 

We use the following polynomial kernel functions of order 2, 4 and 6, respec- 
tively, with support on [—1,1]: 

K4{u) = §{1 - u^){3 - 7u^), 

Kq{u) = i§(l - n2)(33M^ - 30u2 + 5). 

From the viewpoint of computing time, kernel based or finite differences 
estimations with the same number of simulations are comparable. All the 
numerical tests have been realized in Visual C++ on a Pentium 4 xeon 3 
GHz processor with 1 Gb of RAM. 

European digital call option. In the context of the Black-Scholes model, 
it was observed by [9] that the optimal weight for European options can 
be obtained by means of the Malliavin integration by part formula, and 
coincides with the likelihood estimator introduced by [3]. Therefore, we are 
not hoping to compete with the Malliavin-based Monte Carlo estimator. 

From our numerical experiments, we observed that the gain from using 
kernel estimators based on an exponential rather than a uniform random- 
izing distribution I was very poor, especially when the order of the kernel 
function increases. From a numerical viewpoint, the gain obtained at most 
counter-balanced the numerical price of the minimization procedure. The 
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examples presented here are therefore based on a uniform randomization 
distribution I. 

The distributions of the different estimators based on = 10^ simulations 
are reported in Figure 1. The good performance of the Malliavin estimator is 
confirmed by our numerical experiments. However, we observe surprisingly 
that the three kernel based estimators are less accurate than the centered 
finite differences one, although their numerical computing times are compa- 
rable, of the order of 2 seconds. According to Section 4.5, the kernel of order 
6 should perform better than the other ones, but this is not the case here. 
Actually, the terms Ce and Eg are such that the constant term of the mean 
square error increases very fast with the variability of K, which naturally 
increases with its order. For example, the MSE of the estimator based on the 
kernel of order 4 is ten times bigger than the one of the finite differences one, 
although they have the same rate of convergence. Furthermore, the optimal 
bandwidth h increases with the order of the kernel, so that the asymptotic 
approximations become less accurate. 

In order to further investigate this effect, we increase the number of sim- 
ulations. Figure 2 shows the distribution of the finite differences estimator 
and the kernel based estimator of order 6 based on = 10^ simulations 
where each simulation takes approximately 30 minutes on our computer. In 
this case, we observe that the kernel based estimator of order 6 truly out- 



K2 K4 K6 ^Malliavin FD - - - True value 




0,016 0,0161 0,0162 0,0163 0,0164 0,0165 0,0166 0.0167 0,0168 0,0169 

Fig. 1. Delta of an European digital call, N — 1 million. 
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performs the finite differences one: its bias and its variance are two times 
smaller. This confirms the theoretical asymptotic results obtained in Section 
4.5. We do not consider that the high number of simulations required is a se- 
rious restriction since it is just a matter of computer power or time given to 
the simulation. Furthermore, the good performance of the kernel based esti- 
mators of high order can be observed for a smaller number of simulations if 
we use in addition variance reduction technique. For example, by performing 
the antithetic variable technique with respect to the randomizing density 
we observe that the kernel based estimator of order 6 outperforms the finite 
differences estimator with 6 x 10'' simulations, corresponding to a computer 
time of about 2 minutes. 

Asian digital call option. We next investigate the case of an Asian option, 
where the Malliavin integration by parts formula does not lead to the optimal 
weight; see [9]. The distribution of the different estimators based on = 10^ 
simulations are reported in Figure 3, where the "true value" of the Greek 
has been approximated by an unbiased Malliavin estimation with a very 
large number of simulations. Even if the Malliavin weight is not optimal, 
the Malliavin estimator still outperforms the other estimators. As for the 
European digital call, the finite differences estimator outperforms the kernel 
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K2 K4 K6 ^Malliavin FD - - - 'True value" 




0,0277 0,0282 0,0287 0,0292 0,0297 

Fig. 3. Delta of an Asian digital call, N =1 million. 

based estimators, but one simply requires more simulations in order to make 
the kernel estimator of order 6 more efficient than the finite differences one. 

Conclusion (numerical results). Other tests realized with different pa- 
rameters, payoff functions or randomizing densities lead to rather similar 
results. Our kernel based estimator with order p > 4 of the delta of a dig- 
ital option outperforms asymptotically the finite differences one, but one 
requires a large number of simulation to verify this fact empirically. Never- 
theless, the high number of simulations required can be significantly reduced 
by means of variance reduction techniques. When the density of the under- 
lying is unknown and the pay-off function is irregular, the Malliavin based 
estimator is still more efficient than the others. Nevertheless, in general, 
Malliavin weights are very difficult to derive analytically and this is pre- 
cisely the advantage of the other estimators which are straightforward to 
implement. 
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